INTEGER FACTORIZATION OF A POSITIVE-DEFINITE MATRIX


JOEL A. TROPP


Abstract. This paper establishes that every positive-definite matrix can be written as a positive linear combination of 
outer products of integer-valued vectors whose entries are bounded by the geometric mean of the condition number and 
the dimension of the matrix. 


1. Motivation 


This paper addresses a geometric question that arises in the theory of discrete normal approximation [BLX15] 
and in the analysis of hardware for implementing matrix multiplication [LUW15]. The problem requires us to 
represent a nonsingular covariance matrix as a positive linear combination of outer products of integer vectors. 
The theoretical challenge is to obtain an optimal bound on the magnitude of the integers required as a function of 
the condition number of the matrix. We establish the following result. 


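Theorem 1.1. For positive integers m and d, define a set of bounded integer vectors:

$$\mathbb{Z}_m^d := \{ \mathbf{z} \in \mathbb{Z}^d : |z_i| \le m \text{ for } i = 1, \dots, d \}.$$

Let A be a real d × d positive-definite matrix with (finite) spectral condition number

$$\kappa(\mathbf{A}) := \lambda_{\max}(\mathbf{A}) / \lambda_{\min}(\mathbf{A}),$$

where $\lambda_{\max}$ and $\lambda_{\min}$ denote the maximum and minimum eigenvalue maps. Every such matrix A can be expressed as

$$\mathbf{A} = \sum_{i=1}^{r} \alpha_i \mathbf{z}_i \mathbf{z}_i^* \quad\text{where}\quad \mathbf{z}_i \in \mathbb{Z}_m^d \quad\text{and}\quad m \le 1 + \tfrac{1}{2}\sqrt{(d-1) \cdot \kappa(\mathbf{A})}.$$

The coefficients $\alpha_i$ are positive, and the number r of terms satisfies r ≤ d(d+1)/2. The symbol * refers to the transpose operation.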


This result has an alternative interpretation as a matrix factorization: 

$$\mathbf{A} = \mathbf{Z} \boldsymbol{\Lambda} \mathbf{Z}^*.$$

In this expression, Z is a d × r integer matrix with entries bounded by m. The r × r matrix Λ is nonnegative and diagonal.
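
As a concrete illustration of the factorization A = ZΛZ*, the following Python sketch rebuilds a small matrix from integer vectors and positive coefficients. The vectors, the coefficients, and the helper name `weighted_outer_sum` are assumptions chosen for this example and do not come from the theorem; the sketch only checks the arithmetic of the identity.

```python
def weighted_outer_sum(coeffs, vectors):
    """Return sum_i coeffs[i] * z_i z_i^T as a nested list of numbers."""
    d = len(vectors[0])
    total = [[0] * d for _ in range(d)]
    for alpha, z in zip(coeffs, vectors):
        for i in range(d):
            for j in range(d):
                total[i][j] += alpha * z[i] * z[j]
    return total

# Hypothetical example with d = 2 and m = 1: every entry of each z_i lies in {-1, 0, 1}.
vectors = [(1, 1), (1, 0), (0, 1)]   # columns of the integer matrix Z
coeffs = [2, 3, 1]                   # diagonal of the nonnegative matrix Lambda
A = weighted_outer_sum(coeffs, vectors)
print(A)  # [[5, 2], [2, 3]]
```

Here r = 3 = d(d+1)/2 terms are used, which matches the bound on the number of summands in the theorem.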

The proof of Theorem 1.1 appears in Section 3. Section 4 demonstrates that the dependence on the condition 
number cannot be improved. We believe that the dependence on the dimension is also optimal, but we did not 
find an example that confirms this surmise. 


2. Notation & Background 

This section contains brief preliminaries. The books [HJ90, Bha97, Bar02, BV04] are good foundational references for the techniques in this paper.

We use lowercase italic letters, such as c, for scalars. Lowercase boldface letters, such as z, denote vectors. Uppercase boldface letters, such as A, refer to matrices. We write $z_i$ for the ith component of a vector z, and $a_{ij}$ for the (i, j) component of a matrix A. The jth column of the matrix A will be denoted by $\mathbf{a}_j$.

We work primarily in the real linear space of real d × d symmetric matrices, equipped with the usual componentwise addition and scalar multiplication:

$$\mathbb{H}^d := \{ \mathbf{A} \in \mathbb{R}^{d \times d} : \mathbf{A} = \mathbf{A}^* \}.$$

Note that $\mathbb{H}^d$ has dimension d(d+1)/2. The trace of a matrix $\mathbf{A} \in \mathbb{H}^d$ is the sum of its diagonal entries:

$$\operatorname{tr}(\mathbf{A}) := \sum_{i=1}^{d} a_{ii}.$$

We equip $\mathbb{H}^d$ with the inner product $(\mathbf{B}, \mathbf{A}) \mapsto \operatorname{tr}(\mathbf{B}\mathbf{A})$ to obtain a real inner-product space. All statements about closures refer to the norm topology induced by this inner product.

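Define the set of positive-semidefinite matrices in $\mathbb{H}^d$:

$$\mathbb{H}_{+}^d := \{ \mathbf{A} \in \mathbb{H}^d : \mathbf{u}^* \mathbf{A} \mathbf{u} \ge 0 \text{ for each } \mathbf{u} \in \mathbb{R}^d \}.$$

Similarly, the set of positive-definite matrices is

$$\mathbb{H}_{++}^d := \{ \mathbf{A} \in \mathbb{H}^d : \mathbf{u}^* \mathbf{A} \mathbf{u} > 0 \text{ for each nonzero } \mathbf{u} \in \mathbb{R}^d \}.$$

The members of the set $-\mathbb{H}_{++}^d$ are called negative-definite matrices.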

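For a matrix $\mathbf{A} \in \mathbb{H}^d$, the decreasingly ordered eigenvalues will be written as

$$\lambda_1^{\downarrow}(\mathbf{A}) \ge \lambda_2^{\downarrow}(\mathbf{A}) \ge \dots \ge \lambda_d^{\downarrow}(\mathbf{A}).$$

Similarly, the increasingly ordered eigenvalues are denoted as

$$\lambda_1^{\uparrow}(\mathbf{A}) \le \lambda_2^{\uparrow}(\mathbf{A}) \le \dots \le \lambda_d^{\uparrow}(\mathbf{A}).$$

Note that each eigenvalue map $\lambda(\cdot)$ is positively homogeneous; that is, $\lambda(\alpha \mathbf{A}) = \alpha \lambda(\mathbf{A})$ for all $\alpha > 0$.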

Let us introduce some concepts from conic geometry in the setting of $\mathbb{H}^d$. A cone is a subset $K \subset \mathbb{H}^d$ that is positively homogeneous; in other words, $\alpha K = K$ for all $\alpha > 0$. A convex cone is a cone that is also a convex set. The conic hull of a set $E \subset \mathbb{H}^d$ is the smallest convex cone that contains E:

$$\operatorname{cone}(E) := \left\{ \sum_{i=1}^{r} \alpha_i \mathbf{A}_i : \alpha_i \ge 0 \text{ and } \mathbf{A}_i \in E \text{ and } r \in \mathbb{N} \right\}. \tag{2.1}$$

The conic hull of a finite set is closed. Since the space $\mathbb{H}^d$ has dimension d(d+1)/2, we can choose the explicit value r = d(d+1)/2 in the expression (2.1). This point follows from a careful application of Carathéodory’s theorem [Bar02, Thm. I(2.3)].

The dual cone associated with a cone $K \subset \mathbb{H}^d$ is the set

$$K^* := \{ \mathbf{B} \in \mathbb{H}^d : \operatorname{tr}(\mathbf{B}\mathbf{A}) \ge 0 \text{ for each } \mathbf{A} \in K \}. \tag{2.2}$$

This set is always a closed convex cone because it is an intersection of closed halfspaces. It is easy to check that conic duality reverses inclusion; that is, for any two cones $C, K \subset \mathbb{H}^d$,

$$C \subset K \quad\text{implies}\quad K^* \subset C^*.$$

Note that we take the relation ⊂ to include the possibility that the sets are equal. The bipolar theorem [Bar02, Thm. IV(4.2)] states that the double dual $(K^*)^*$ of a cone K equals the closure of the conic hull of K.


3. Proof of Theorem 1.1

We will establish Theorem 1.1 using methods from the geometry of convex cones. The result is ultimately a 
statement about the containment of one convex cone in another. We approach this question by verifying the 
reverse inclusion for the dual cones. To obtain a good bound on the size of the integer vectors, the key idea is to 
use an averaging argument. 

3.1. Step 1: Reduction to Conic Geometry. Once and for all, fix the ambient dimension d. First, we introduce the convex cone of positive-definite matrices with bounded condition number. For a real number c ≥ 1, define

$$K(c) := \{ \mathbf{A} \in \mathbb{H}_{++}^d : \kappa(\mathbf{A}) \le c \}.$$

The set K(c) is a cone because the condition number is scale invariant: $\kappa(\alpha \mathbf{A}) = \kappa(\mathbf{A})$ for $\alpha > 0$. To see that K(c) is convex, write the membership condition $\kappa(\mathbf{A}) \le c$ in the form

$$\lambda_{\max}(\mathbf{A}) - c \cdot \lambda_{\min}(\mathbf{A}) \le 0.$$

On the space of symmetric matrices, the maximum eigenvalue is convex, while the minimum eigenvalue is concave [BV04, Ex. 3.10]. Since K(c) is a sublevel set of a convex function, it must be convex.
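
The cone properties of K(c) can be checked numerically in dimension two, where the eigenvalues of a symmetric matrix have a closed form. The following sketch is an illustration only; the function names and the test matrices are assumptions made for this example, and a finite set of checks is of course not a proof of convexity.

```python
import math

def eig_sym_2x2(a11, a12, a22):
    """Return (lambda_min, lambda_max) of the symmetric matrix [[a11, a12], [a12, a22]]."""
    mean = (a11 + a22) / 2.0
    radius = math.hypot((a11 - a22) / 2.0, a12)
    return mean - radius, mean + radius

def in_K(a11, a12, a22, c):
    """Membership test for K(c): positive definite and lambda_max - c * lambda_min <= 0."""
    lam_min, lam_max = eig_sym_2x2(a11, a12, a22)
    return lam_min > 0 and lam_max - c * lam_min <= 0

c = 4.0
A1 = (4.0, 0.0, 1.0)   # diag(4, 1), condition number 4
A2 = (1.0, 0.0, 4.0)   # diag(1, 4), condition number 4
midpoint = tuple((x + y) / 2.0 for x, y in zip(A1, A2))   # diag(2.5, 2.5)
scaled = tuple(8.0 * x for x in A1)                        # 8 * A1

print(in_K(*A1, c), in_K(*A2, c))   # both matrices lie in K(4)
print(in_K(*midpoint, c))           # their average stays in K(4)
print(in_K(*scaled, c))             # a positive multiple stays in K(4)
print(in_K(*A1, 3.0))               # diag(4, 1) does not lie in K(3)
```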

Next, select a positive integer m. We introduce a closed convex cone of positive-semidefinite matrices derived from the outer products of bounded integer vectors:

$$Z(m) := \operatorname{cone}\{ \mathbf{z}\mathbf{z}^* : \mathbf{z} \in \mathbb{Z}_m^d \}.$$

It is evident that Z(m) is a closed convex cone because it is the conic hull of a finite set. Note that every element of this cone can be written as

$$\sum_{i=1}^{r} \alpha_i \mathbf{z}_i \mathbf{z}_i^* \quad\text{where}\quad \alpha_i \ge 0 \quad\text{and}\quad \mathbf{z}_i \in \mathbb{Z}_m^d.$$

By the Carathéodory Theorem, we may take the number r of summands to be r = d(d+1)/2.
Therefore, we can prove Theorem 1.1 by verifying that

$$K(c) \subset Z(m) \quad\text{when}\quad m \ge \tfrac{1}{2}\sqrt{(d-1) \cdot c}. \tag{3.1}$$

Indeed, the bound $m \le 1 + \tfrac{1}{2}\sqrt{(d-1) \cdot \kappa(\mathbf{A})}$ in the theorem statement always admits a positive integer m that satisfies the latter inequality when $c = \kappa(\mathbf{A})$. Since the operation of conic duality reverses inclusion and Z(m) is closed, the condition (3.1) is equivalent with

$$Z(m)^* \subset K(c)^* \quad\text{when}\quad m \ge \tfrac{1}{2}\sqrt{(d-1) \cdot c}. \tag{3.2}$$

We will establish the inclusion (3.2).
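
For orientation, the smallest positive integer allowed by the condition on m can be computed directly. The sketch below uses floating-point arithmetic, so it may be off by one when ½√((d−1)·c) is extremely close to an integer; the function name `smallest_admissible_m` is an assumption of this example.

```python
import math

def smallest_admissible_m(d, c):
    """Smallest positive integer m with m >= 0.5 * sqrt((d - 1) * c)."""
    lower = 0.5 * math.sqrt((d - 1) * c)
    return max(1, math.ceil(lower))

for d, c in [(2, 11.0), (3, 50.0), (1, 7.0)]:
    m = smallest_admissible_m(d, c)
    upper = 1.0 + 0.5 * math.sqrt((d - 1) * c)
    # The last column checks that m respects the bound stated in Theorem 1.1.
    print(d, c, m, m <= upper)
```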


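3.2. Step 2: The Dual of K(c). Our next objective is to obtain a formula for the dual cone $K(c)^*$. We claim that

$$K(c)^* = \left\{ \mathbf{B} \in \mathbb{H}^d : \sum_{i=s+1}^{d} \lambda_i^{\downarrow}(\mathbf{B}) \ge -\frac{1}{c} \sum_{i=1}^{s} \lambda_i^{\downarrow}(\mathbf{B}) \text{ where } \lambda_s^{\downarrow}(\mathbf{B}) \ge 0 > \lambda_{s+1}^{\downarrow}(\mathbf{B}) \right\}. \tag{3.3}$$

We instate the convention that s = d when B is positive semidefinite. In particular, the set of positive-semidefinite matrices is contained in the dual cone: $\mathbb{H}_{+}^d \subset K(c)^*$. We also interpret the s = 0 case in (3.3) to exclude negative-definite matrices from $K(c)^*$.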

Let us establish (3.3). The definition (2.2) of a dual cone leads to the equivalence

$$\mathbf{B} \in K(c)^* \quad\text{if and only if}\quad 0 \le \inf_{\mathbf{A} \in K(c)} \operatorname{tr}(\mathbf{B}\mathbf{A}).$$

To evaluate the infimum, note that the cone K(c) is orthogonally invariant because the condition number of a matrix depends only on the eigenvalues. That is, $\mathbf{A} \in K(c)$ implies that $\mathbf{Q}\mathbf{A}\mathbf{Q}^* \in K(c)$ for each orthogonal matrix Q with dimension d. Therefore, $\mathbf{B} \in K(c)^*$ if and only if

$$0 \le \inf_{\mathbf{A} \in K(c)} \inf_{\mathbf{Q}} \operatorname{tr}(\mathbf{B}\mathbf{Q}\mathbf{A}\mathbf{Q}^*) = \inf_{\mathbf{A} \in K(c)} \sum_{i=1}^{d} \lambda_i^{\downarrow}(\mathbf{B}) \cdot \lambda_i^{\uparrow}(\mathbf{A}). \tag{3.4}$$

The inner infimum takes place over orthogonal matrices Q. The identity is a well-known result due to Richter [Ric58, Satz 1]; see the paper [Mir59, Thm. 1] for an alternative proof. This fact is closely related to (a version of) the Hoffman-Wielandt theorem [Bha97, Prob. III.6.15].

Now, the members of the cone K(c) are those matrices A whose eigenvalues satisfy the bounds $0 < \lambda_1^{\uparrow}(\mathbf{A})$ and $\lambda_d^{\uparrow}(\mathbf{A}) \le c \cdot \lambda_1^{\uparrow}(\mathbf{A})$. Owing to the invariance of the inequality (3.4) and the cone K(c) to scaling, we can normalize A so that $\lambda_1^{\uparrow}(\mathbf{A}) = 1$. Thus, the inequality (3.4) holds if and only if

$$0 \le \inf\left\{ \sum_{i=1}^{d} \lambda_i^{\downarrow}(\mathbf{B}) \cdot \mu_i : 1 = \mu_1 \le \mu_2 \le \dots \le \mu_d \le c \right\}.$$

If B is positive semidefinite, then this bound is always true. If B is negative definite, then this inequality is always false. Ruling out these cases, let s be the index where $\lambda_s^{\downarrow}(\mathbf{B}) \ge 0 > \lambda_{s+1}^{\downarrow}(\mathbf{B})$, and observe that 0 < s < d. The infimum is achieved when we select $\mu_i = 1$ for i = 1, ..., s and $\mu_i = c$ for i = s+1, ..., d. In conclusion,

$$\mathbf{B} \in K(c)^* \quad\text{if and only if}\quad 0 \le \sum_{i=1}^{s} \lambda_i^{\downarrow}(\mathbf{B}) + c \sum_{i=s+1}^{d} \lambda_i^{\downarrow}(\mathbf{B}).$$

With our conventions for s = 0 and s = d, this inequality coincides with the advertised result (3.3).
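
The characterization above reduces membership in K(c)* to a test on the eigenvalues of B. A minimal sketch, assuming the eigenvalues have already been computed by some other means (the function name `in_dual_K` is hypothetical):

```python
def in_dual_K(eigenvalues, c):
    """Test 0 <= (sum of nonnegative eigenvalues) + c * (sum of negative eigenvalues).

    This single inequality accepts every positive-semidefinite matrix and
    rejects every negative-definite matrix, in line with the conventions
    for s = d and s = 0.
    """
    nonnegative = sum(lam for lam in eigenvalues if lam >= 0)
    negative = sum(lam for lam in eigenvalues if lam < 0)
    return nonnegative + c * negative >= 0

print(in_dual_K([3.0, -1.0], 2.0))    # 3 - 2 * 1 = 1, so True
print(in_dual_K([3.0, -1.0], 4.0))    # 3 - 4 * 1 = -1, so False
print(in_dual_K([2.0, 0.0], 10.0))    # positive semidefinite, so True
print(in_dual_K([-1.0, -2.0], 1.0))   # negative definite, so False
```

The second and first calls show how the dual cone shrinks as c grows, which mirrors the fact that K(c) itself grows with c.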
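
3.3. Step 3: The Dual of Z(m). Next, we check that

$$Z(m)^* = \{ \mathbf{B} \in \mathbb{H}^d : \mathbf{z}^* \mathbf{B} \mathbf{z} \ge 0 \text{ for each } \mathbf{z} \in \mathbb{Z}_m^d \}. \tag{3.5}$$

According to the definition (2.2) of a dual cone,

$$Z(m)^* = \{ \mathbf{B} \in \mathbb{H}^d : \operatorname{tr}(\mathbf{B}\mathbf{A}) \ge 0 \text{ for each } \mathbf{A} \in Z(m) \}.$$

Since Z(m) is the conic hull of the matrices $\mathbf{z}\mathbf{z}^*$ where $\mathbf{z} \in \mathbb{Z}_m^d$, the matrix $\mathbf{B} \in Z(m)^*$ if and only if $\operatorname{tr}(\mathbf{B}\mathbf{A}) \ge 0$ for each matrix $\mathbf{A} = \mathbf{z}\mathbf{z}^*$. Therefore,

$$Z(m)^* = \{ \mathbf{B} \in \mathbb{H}^d : \operatorname{tr}(\mathbf{B}\mathbf{z}\mathbf{z}^*) \ge 0 \text{ for each } \mathbf{z} \in \mathbb{Z}_m^d \}.$$

Cycling the trace, we arrive at the representation (3.5).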
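
Because the set of bounded integer vectors is finite, the representation (3.5) gives a brute-force membership test for Z(m)*. The sketch below enumerates all (2m+1)^d candidate vectors, so it is practical only for small d and m; the matrix B and the function names are hypothetical choices for this example.

```python
from itertools import product

def quad_form(B, z):
    """Evaluate the quadratic form z^T B z for a square nested-list matrix B."""
    d = len(z)
    return sum(B[i][j] * z[i] * z[j] for i in range(d) for j in range(d))

def in_dual_Z(B, m):
    """Check z^T B z >= 0 for every integer vector z with entries in [-m, m]."""
    d = len(B)
    return all(quad_form(B, z) >= 0 for z in product(range(-m, m + 1), repeat=d))

# An indefinite matrix (its determinant is -1) that still passes the test for m = 1.
B = [[2, 1], [1, 0]]
print(in_dual_Z(B, 1))   # True
print(in_dual_Z(B, 2))   # False: z = (1, -2) gives 2 - 4 = -2
```

The example shows that Z(1)* is strictly larger than the positive-semidefinite cone, and that increasing m removes indefinite matrices from the dual cone.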


3.4. Step 4: Checking Membership. Finally, we need to verify that $Z(m)^* \subset K(c)^*$ under suitable conditions on the parameters m and c.

To that end, select a matrix $\mathbf{B} \in Z(m)^*$. If B is positive semidefinite, then $\mathbf{B} \in K(c)^*$ because $K(c)^*$ contains the set of positive-semidefinite matrices. It is not possible for B to be negative definite because the expression (3.5) forces $\mathbf{z}^* \mathbf{B} \mathbf{z} \ge 0$ for each nonzero $\mathbf{z} \in \mathbb{Z}_m^d$. Therefore, we may exclude these cases.

Let s be the index where $\lambda_s^{\downarrow}(\mathbf{B}) \ge 0 > \lambda_{s+1}^{\downarrow}(\mathbf{B})$, and note that 0 < s < d. The formula (3.3) indicates that we should examine the sum of the d − s smallest eigenvalues of B to determine whether B is a member of $K(c)^*$. This sum of eigenvalues can be represented as a trace [HJ90, Eqn. (4.3.20)]:

$$\sum_{i=s+1}^{d} \lambda_i^{\downarrow}(\mathbf{B}) = \operatorname{tr}(\mathbf{U}^* \mathbf{B} \mathbf{U}) \quad\text{where } \mathbf{U} \text{ is a } d \times (d-s) \text{ matrix with orthonormal columns.}$$

In view of (3.5), we must use the fact that $\mathbf{z}^* \mathbf{B} \mathbf{z} \ge 0$ for $\mathbf{z} \in \mathbb{Z}_m^d$ to bound the sum of eigenvalues below.

We will achieve this goal with an averaging argument. For each number $a \in [-1, 1]$, define an integer-valued random variable:

$$R_m(a) := \begin{cases} \lceil ma \rceil & \text{with probability } ma - \lfloor ma \rfloor \\ \lfloor ma \rfloor & \text{with probability } 1 - (ma - \lfloor ma \rfloor). \end{cases}$$

Each of the random variables $R_m(a)$ is supported on {0, ±1, ..., ±m}. Furthermore, $\mathbb{E}\, R_m(a) = ma$ and $\operatorname{Var}(R_m(a)) \le \tfrac{1}{4}$. In other words, we randomly round ma up or down to the nearest integer in such a way that the average value is ma and the variance is uniformly bounded. Note that $R_m(a)$ is a constant random variable whenever ma takes an integer value.
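
The mean and variance claims for the randomized rounding can be verified exactly with rational arithmetic. The following sketch tabulates the two-point distribution rather than sampling from it; the function names and the choice m = 2, a = 3/5 are assumptions made for illustration.

```python
from fractions import Fraction
import math

def rounding_distribution(m, a):
    """Return [(value, probability), ...] for the random variable R_m(a)."""
    t = m * Fraction(a)
    lo, hi = math.floor(t), math.ceil(t)
    if lo == hi:                       # m * a is an integer: no randomness
        return [(lo, Fraction(1))]
    p_up = t - lo                      # probability of rounding up
    return [(hi, p_up), (lo, 1 - p_up)]

def mean_and_variance(dist):
    """Exact mean and variance of a finite distribution given as (value, probability) pairs."""
    mean = sum(value * prob for value, prob in dist)
    variance = sum((value - mean) ** 2 * prob for value, prob in dist)
    return mean, variance

m, a = 2, Fraction(3, 5)
dist = rounding_distribution(m, a)
mean, variance = mean_and_variance(dist)
print(dist)
print(mean == m * a, variance <= Fraction(1, 4))
```

In general the variance equals p(1 − p), where p is the probability of rounding up, which explains the uniform bound of 1/4.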

We apply this randomized rounding operation to each entry $u_{ij}$ of the matrix U. Let X be a d × (d − s) random matrix with independent entries $X_{ij}$ that have the distributions

$$X_{ij} \sim \frac{1}{m} R_m(u_{ij}) \quad\text{for } i = 1, \dots, d \text{ and } j = 1, \dots, d-s.$$

By construction, $\mathbb{E}\, \mathbf{X} = \mathbf{U}$ and $\operatorname{Var}(X_{ij}) \le 1/(4m^2)$ for each pair (i, j) of indices.

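Develop the quantity of interest by adding and subtracting the random matrix X:

$$\sum_{i=s+1}^{d} \lambda_i^{\downarrow}(\mathbf{B}) = \operatorname{tr}(\mathbf{U}^* \mathbf{B} \mathbf{U}) = \operatorname{tr}(\mathbf{X}^* \mathbf{B} \mathbf{X}) - \operatorname{tr}((\mathbf{X} - \mathbf{U})^* \mathbf{B} (\mathbf{X} - \mathbf{U})) - \operatorname{tr}(\mathbf{U}^* \mathbf{B} (\mathbf{X} - \mathbf{U})) - \operatorname{tr}((\mathbf{X} - \mathbf{U})^* \mathbf{B} \mathbf{U}).$$

Take the expectation over X and use the property $\mathbb{E}\, \mathbf{X} = \mathbf{U}$ to reach

$$\sum_{i=s+1}^{d} \lambda_i^{\downarrow}(\mathbf{B}) = \mathbb{E} \operatorname{tr}(\mathbf{X}^* \mathbf{B} \mathbf{X}) - \mathbb{E} \operatorname{tr}((\mathbf{X} - \mathbf{U})^* \mathbf{B} (\mathbf{X} - \mathbf{U})). \tag{3.6}$$

It remains to bound the right-hand side of (3.6) below.

Expand the trace in the first term on the right-hand side of (3.6):

$$\mathbb{E} \operatorname{tr}(\mathbf{X}^* \mathbf{B} \mathbf{X}) = \mathbb{E}\left[ \sum_{j=1}^{d-s} \mathbf{x}_j^* \mathbf{B} \mathbf{x}_j \right] = \frac{1}{m^2}\, \mathbb{E}\left[ \sum_{j=1}^{d-s} (m\mathbf{x}_j)^* \mathbf{B} (m\mathbf{x}_j) \right] \ge 0. \tag{3.7}$$

We have written $\mathbf{x}_j$ for the jth column of X. Each vector $m\mathbf{x}_j$ belongs to $\mathbb{Z}_m^d$. Since $\mathbf{B} \in Z(m)^*$, it follows from the representation (3.5) of the cone that each of the summands is nonnegative.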
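
The identity (3.6) and the sign of its first term can be checked exactly for a single rounded column by enumerating every rounding outcome. In the sketch below, the matrix B, the unit vector u, and the value m = 1 are hypothetical choices; u is an arbitrary unit vector rather than an eigenvector of B, which suffices because the identity relies only on the property that the rounded vector has mean u.

```python
from fractions import Fraction
from itertools import product
import math

def rounding_distribution(m, a):
    """Return [(value, probability), ...] for the randomized rounding of m * a."""
    t = m * Fraction(a)
    lo, hi = math.floor(t), math.ceil(t)
    if lo == hi:
        return [(lo, Fraction(1))]
    return [(hi, t - lo), (lo, 1 - (t - lo))]

def quad_form(B, z):
    d = len(z)
    return sum(B[i][j] * z[i] * z[j] for i in range(d) for j in range(d))

def expectation_terms(B, u, m):
    """Return (E x^T B x, E (x - u)^T B (x - u)) for one rounded column x."""
    first, second = Fraction(0), Fraction(0)
    for outcome in product(*(rounding_distribution(m, a) for a in u)):
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p                   # entries are rounded independently
        x = [Fraction(value, m) for value, _ in outcome]
        diff = [xi - ui for xi, ui in zip(x, u)]
        first += prob * quad_form(B, x)
        second += prob * quad_form(B, diff)
    return first, second

B = [[2, 1], [1, 0]]                    # satisfies z^T B z >= 0 on integer vectors with entries in [-1, 1]
u = [Fraction(3, 5), Fraction(4, 5)]    # a unit vector with rational entries
first, second = expectation_terms(B, u, m=1)
print(first - second == quad_form(B, u))   # the single-column form of (3.6)
print(first >= 0)                          # the single-column form of (3.7)
```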

Next, we turn to the second term on the right-hand side of (3.6).

$$\mathbb{E} \operatorname{tr}((\mathbf{X} - \mathbf{U})^* \mathbf{B} (\mathbf{X} - \mathbf{U})) = \sum_{j=1}^{d-s} \mathbb{E}\left[ (\mathbf{x}_j - \mathbf{u}_j)^* \mathbf{B} (\mathbf{x}_j - \mathbf{u}_j) \right] = \sum_{j=1}^{d-s} \sum_{i=1}^{d} b_{ii} \operatorname{Var}(X_{ij}) \le \frac{d-s}{4m^2} \sum_{i=1}^{d} (b_{ii})_+.$$

In the second identity, we applied the fact that the entries of the vector $\mathbf{x}_j - \mathbf{u}_j$ are independent, centered random variables to see that there is no contribution from the off-diagonal terms of B. The inequality relies on the variance bound $1/(4m^2)$ for each random variable $X_{ij}$. The function $(\cdot)_+ : a \mapsto \max\{a, 0\}$ returns the positive part of a number.

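Schur’s theorem [HJ90, Thm. 4.3.26] states that eigenvalues of the symmetric matrix B majorize its diagonal entries. Since $(\cdot)_+$ is convex, the real-valued map $\mathbf{a} \mapsto \sum_{i=1}^{d} (a_i)_+$ on $\mathbb{R}^d$ respects the majorization relation [Bha97, Thm. II.3.1]. Thus,

$$\sum_{i=1}^{d} (b_{ii})_+ \le \sum_{i=1}^{d} (\lambda_i^{\downarrow}(\mathbf{B}))_+ = \sum_{i=1}^{s} \lambda_i^{\downarrow}(\mathbf{B}).$$

The equality relies on the assumption that the eigenvalues $\lambda_i^{\downarrow}(\mathbf{B})$ become negative at index s+1.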
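
A small numerical check of the majorization inequality may help fix ideas. The sketch below uses the closed-form eigenvalues of a symmetric 2 × 2 matrix; the matrix entries and the helper names are assumptions chosen for this example.

```python
import math

def eig_sym_2x2(b11, b12, b22):
    """Return (lambda_min, lambda_max) of the symmetric matrix [[b11, b12], [b12, b22]]."""
    mean = (b11 + b22) / 2.0
    radius = math.hypot((b11 - b22) / 2.0, b12)
    return mean - radius, mean + radius

def positive_part(a):
    return max(a, 0.0)

b11, b12, b22 = 2.0, 1.0, 0.0
lam_min, lam_max = eig_sym_2x2(b11, b12, b22)   # 1 - sqrt(2) and 1 + sqrt(2)
diag_sum = positive_part(b11) + positive_part(b22)
eig_sum = positive_part(lam_min) + positive_part(lam_max)
print(diag_sum, eig_sum, diag_sum <= eig_sum)
```

For this matrix s = 1, so the right-hand side reduces to the single nonnegative eigenvalue 1 + √2, which exceeds the sum 2 of the positive parts of the diagonal entries.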

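Merging the last two displays, we obtain the estimate

$$\mathbb{E} \operatorname{tr}((\mathbf{X} - \mathbf{U})^* \mathbf{B} (\mathbf{X} - \mathbf{U})) \le \frac{d-s}{4m^2} \sum_{i=1}^{s} \lambda_i^{\downarrow}(\mathbf{B}). \tag{3.8}$$

This bound has exactly the form that we need.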

Combining (3.6), (3.7), and (3.8), we arrive at the inequality

$$\sum_{i=s+1}^{d} \lambda_i^{\downarrow}(\mathbf{B}) \ge -\frac{d-s}{4m^2} \sum_{i=1}^{s} \lambda_i^{\downarrow}(\mathbf{B}).$$

In view of the representation (3.3) of the dual cone $K(c)^*$, the matrix $\mathbf{B} \in K(c)^*$ provided that

$$-\frac{d-s}{4m^2} \ge -\frac{1}{c}.$$

Rearranging this expression, we obtain the sufficient condition

$$m \ge \tfrac{1}{2}\sqrt{(d-s) \cdot c} \quad\text{implies}\quad \mathbf{B} \in K(c)^*.$$

For a general matrix $\mathbf{B} \in Z(m)^*$, we do not control the index s where the eigenvalues of B change sign, so we must insulate ourselves against the worst case, s = 1. This choice leads to the condition (3.2), and the proof is complete.


4. Optimality 

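There are specific matrices where the size of the integers in the representation does not depend on the condition number. For instance, let b ≥ 1, and consider the matrix

$$\mathbf{A} = \begin{bmatrix} b & 0 \\ 0 & 1 \end{bmatrix} = b \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}^* + 1 \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix}^*.$$

The condition number $\kappa(\mathbf{A}) = b$, which we can make arbitrarily large, but the integers in the representation never exceed one.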

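Nevertheless, we can show by example that the dependence of Theorem 1.1 on the condition number is optimal in dimension d = 2. For a number b ≥ 1, consider the 2 × 2 matrix

$$\mathbf{A} = \begin{bmatrix} b^2 + 1 & b \\ b & 2 \end{bmatrix} = \begin{bmatrix} b \\ 1 \end{bmatrix} \begin{bmatrix} b \\ 1 \end{bmatrix}^* + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

From this representation, we quickly determine that the eigenvalues of A are 1 and $b^2 + 2$, so the condition number $\kappa(\mathbf{A}) = b^2 + 2$.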
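
The eigenvalue claim can be confirmed with integer arithmetic, since the two eigenvalues of a 2 × 2 matrix are determined by its trace and determinant. The sketch below assumes integer values of b purely for convenience; the function name is hypothetical.

```python
def trace_and_det(b):
    """Trace and determinant of the matrix [[b^2 + 1, b], [b, 2]]."""
    a11, a12, a22 = b * b + 1, b, 2
    return a11 + a22, a11 * a22 - a12 * a12

for b in range(1, 6):
    trace, det = trace_and_det(b)
    lam_small, lam_large = 1, b * b + 2
    # The claimed eigenvalues must reproduce both the trace and the determinant.
    assert trace == lam_small + lam_large
    assert det == lam_small * lam_large
    print(b, lam_large // lam_small)   # condition number b^2 + 2
```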

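Suppose that we can represent the matrix A as a positive linear combination of outer products of vectors in $\mathbb{Z}_m^2$. We need at most d(d+1)/2 = 3 summands:

$$\mathbf{A} = \begin{bmatrix} b^2 + 1 & b \\ b & 2 \end{bmatrix} = \alpha \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^* + \beta \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}^* + \gamma \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}^* \quad\text{where } \alpha, \beta, \gamma \ge 0 \text{ and } x_i, y_i, z_i \in \mathbb{Z}_m^1. \tag{4.1}$$

The equations in (4.1) associated with the top-left and bottom-right entries of A read as

$$b^2 + 1 = \alpha x_1^2 + \beta y_1^2 + \gamma z_1^2 \quad\text{and}\quad 2 = \alpha x_2^2 + \beta y_2^2 + \gamma z_2^2. \tag{4.2}$$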

We consider three cases: (i) all three of $x_2, y_2, z_2$ are nonzero; (ii) exactly two of $x_2, y_2, z_2$ are nonzero; and (iii) exactly one of $x_2, y_2, z_2$ is nonzero.

Let us begin with case (i). Since $x_2$, $y_2$, and $z_2$ take nonzero integer values, the second equation in (4.2) ensures that

$$2 \ge (\alpha + \beta + \gamma) \min\{x_2^2, y_2^2, z_2^2\} \ge \alpha + \beta + \gamma.$$

Introducing this fact into the first equation in (4.2), we find that

$$b^2 + 1 \le (\alpha + \beta + \gamma) \max\{x_1^2, y_1^2, z_1^2\} \le 2 \max\{x_1^2, y_1^2, z_1^2\}.$$

We obtain a lower bound on the magnitude m of integers in a representation of A where $x_2, y_2, z_2$ are all nonzero:

$$m \ge \max\{|x_1|, |y_1|, |z_1|\} \ge \frac{1}{\sqrt{2}} \sqrt{b^2 + 1} = \frac{1}{\sqrt{2}} \sqrt{\kappa(\mathbf{A}) - 1}. \tag{4.3}$$

Since the bound (4.3) is worse than the estimate in Theorem 1.1 for large b, we discover that the optimal integer representation of A has at least one zero among $x_2, y_2, z_2$.

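Next, we turn to case (ii). By symmetry, we may assume that $z_2 = 0$. As before, the second equation in (4.2) shows that $\alpha + \beta \le 2$. Meanwhile, the representation (4.1) implies that

$$\mathbf{A} - \gamma \begin{bmatrix} z_1 \\ 0 \end{bmatrix} \begin{bmatrix} z_1 \\ 0 \end{bmatrix}^* = \begin{bmatrix} b^2 + 1 - \gamma z_1^2 & b \\ b & 2 \end{bmatrix} \quad\text{is positive semidefinite.}$$

Since the determinant of a positive-semidefinite matrix is nonnegative, we find that $0 \le 2(b^2 + 1 - \gamma z_1^2) - b^2$. Equivalently, $\gamma z_1^2 \le \tfrac{1}{2}(b^2 + 2)$. The first equation in (4.2) now delivers

$$b^2 + 1 = \alpha x_1^2 + \beta y_1^2 + \gamma z_1^2 \le 2 \max\{x_1^2, y_1^2\} + \tfrac{1}{2}(b^2 + 2).$$

It follows that $\max\{x_1^2, y_1^2\} \ge b^2/4$. We obtain a lower bound on the magnitude m of the integers in a representation of A where two of $x_2, y_2, z_2$ are nonzero:

$$m \ge \max\{|x_1|, |y_1|\} \ge \tfrac{1}{2} b = \tfrac{1}{2}\sqrt{\kappa(\mathbf{A}) - 2}. \tag{4.4}$$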

In case (iii), a similar argument leads to the same lower bound for m. 

Examining (4.4), we surmise that the bound from Theorem 1.1,

$$m \le 1 + \tfrac{1}{2}\sqrt{(d-1) \cdot \kappa(\mathbf{A})},$$

on the magnitude m of integers in a representation of A cannot be improved when d = 2 and the condition number $\kappa(\mathbf{A})$ becomes large. Considering the d × d matrix

$$\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{d-2} \end{bmatrix},$$

an analogous argument proves that the dependence of Theorem 1.1 on the condition number is optimal in every dimension d.
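
To see how closely the lower bound (4.4) tracks the upper bound of Theorem 1.1 for the 2 × 2 example with condition number b² + 2, the two quantities can be tabulated for growing b. This is a numerical illustration under the stated formulas only; the function names are assumptions of this sketch.

```python
import math

def lower_bound(b):
    """Lower bound (4.4): m >= b / 2 = 0.5 * sqrt(kappa - 2)."""
    return 0.5 * b

def upper_bound(b, d=2):
    """Upper bound of Theorem 1.1 with kappa = b^2 + 2."""
    kappa = b * b + 2
    return 1.0 + 0.5 * math.sqrt((d - 1) * kappa)

for b in [1, 10, 100, 1000]:
    lo, hi = lower_bound(b), upper_bound(b)
    # The ratio approaches one as b grows, so the gap is only an additive constant.
    print(b, lo, hi, lo / hi)
```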